Abstract: Collimated streams of particles produced in high energy physics experiments
are organized using clustering algorithms to form jets. To construct jets, the experimental
collaborations based at the Large Hadron Collider (LHC) primarily use agglomerative
hierarchical clustering schemes known as sequential recombination. We propose a new class
of algorithms for clustering jets that use infrared and collinear safe mixture models. These
new algorithms, known as fuzzy jets, are clustered using maximum likelihood techniques
and can dynamically determine various properties of jets like their size. We show that the
fuzzy jet size adds additional information to conventional jet tagging variables.
Furthermore, we study the impact of pileup and show that with some slight modifications to the
algorithm, fuzzy jets can be stable up to high pileup interaction multiplicities.



1 Introduction 


As the result of a proton-proton collision at a hadron collider, hundreds of particles are
created and detected [1, 2]. While some particles can be identified by their type, such as
electrons [3, 4] and muons [5, 6], most of the detected particles are light hadrons produced
in collimated sprays called jets. Jets are the consequence of high energy quarks or gluons
fragmenting into colorless hadrons. Experimentally, jets are defined by clustering schemes
which group together measured calorimeter energy deposits or reconstructed charged
particle tracks. A jet algorithm is a clustering scheme that connects the measured objects
with theoretical quantities that can be calculated and simulated. At a hadron collider, the
natural coordinates for describing particles are $p_T$, $y$, and $\phi$, where $p_T$ is the
magnitude of the momentum transverse to the proton beam, $y$ is the rapidity, and $\phi$ is
the azimuthal angle. Particles or calorimeter energy deposits are clustered using jet
algorithms based on distance metrics on their coordinates in
$(p_T, \vec{\rho}) = (p_T, y, \phi)$. In order for a jet algorithm to
be useful to experimentalists and theorists, the collection of jets should be IRC safe in the
following sense:

1. Infrared safe (IR): if a particle $i$ is added with $|p_i| \to 0$, the jets are unaffected.

2. Collinear safe (C): if a particle $i$ with momentum $p_i$ is replaced with two particles $j$
and $k$ with momenta $p_j + p_k = p_i$ such that $|\vec{\rho}_i - \vec{\rho}_j| = 0$, then the
jets are unaffected.

The jet algorithms most widely used at hadron colliders fall into a class of schemes known
as sequential recombination [7]. These IRC safe schemes require metrics $d$ on momenta,
$d_{ij} = d(p_i, p_j) : (p_i, p_j) \to \mathbb{R}^+$ and $d_{iB} = d(p_i) : p_i \to \mathbb{R}^+$,
and proceed as follows:

1. Assign each particle as a proto-jet.

2. Repeat until there are no proto-jets left: Let $(k, \ell) = \arg\min_{i,j} d(p_i, p_j)$ and,
without loss of generality, $d_{kB} < d_{\ell B}$. If $d_{kB} < d_{k\ell}$, declare proto-jet $k$
a jet and remove it from the list. Otherwise, combine proto-jets $k$ and $\ell$ into a new
proto-jet with momentum $p_{\text{new}} = p_\ell + p_k$.


One common prescription is called the Cambridge-Aachen (C/A) algorithm [8, 9], which
uses $d_{ij} = |\vec{\rho}_i - \vec{\rho}_j|^2 / R^2$ and $d_{iB} = 1$. The fixed quantity $R$ is
roughly the size of the jet in $(y, \phi)$. By far, the most ubiquitous jet algorithm used at the
Large Hadron Collider (LHC) is the anti-$k_t$ algorithm [10] with
$d_{ij} = \min(p_{T,i}^{-2}, p_{T,j}^{-2}) \, |\vec{\rho}_i - \vec{\rho}_j|^2 / R^2$ and
$d_{iB} = p_{T,i}^{-2}$.
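As an illustration only, the recombination loop and the two metric choices above can be sketched in a few lines of Python. This is a naive cubic-time sketch that follows the steps as written in the text, not the FastJet implementation; the function names, the `(px, py, pz, E)` tuple layout, and the `power` switch (`-1` for the anti-$k_t$ metrics, `0` for C/A) are assumptions introduced here.

```python
import math

def from_pt_y_phi(pt, y, phi):
    """Massless four-vector (px, py, pz, E) from (pT, y, phi)."""
    return (pt * math.cos(phi), pt * math.sin(phi), pt * math.sinh(y), pt * math.cosh(y))

def pt(p):
    return math.hypot(p[0], p[1])

def rapidity(p):
    return 0.5 * math.log((p[3] + p[2]) / (p[3] - p[2]))

def dist2(a, b):
    """Squared distance in (y, phi), with the phi difference wrapped into [-pi, pi)."""
    dphi = math.atan2(a[1], a[0]) - math.atan2(b[1], b[0])
    dphi = (dphi + math.pi) % (2.0 * math.pi) - math.pi
    return (rapidity(a) - rapidity(b)) ** 2 + dphi ** 2

def sequential_recombination(particles, R=1.0, power=-1):
    """Cluster four-vectors; power=-1 gives anti-kt metrics, power=0 gives C/A."""
    def d_beam(p):
        return pt(p) ** (2 * power)

    protos = [tuple(p) for p in particles]  # step 1: every particle is a proto-jet
    jets = []
    while protos:
        if len(protos) == 1:
            jets.append(protos.pop())
            break
        # (k, l) = argmin over pairs of d_ij
        best = None
        for i in range(len(protos)):
            for j in range(i + 1, len(protos)):
                a, b = protos[i], protos[j]
                d = min(d_beam(a), d_beam(b)) * dist2(a, b) / R ** 2
                if best is None or d < best[0]:
                    best = (d, i, j)
        d_kl, k, l = best
        if d_beam(protos[l]) < d_beam(protos[k]):
            k, l = l, k  # without loss of generality d_kB <= d_lB
        if d_beam(protos[k]) < d_kl:
            jets.append(protos.pop(k))  # proto-jet k becomes a jet
        else:
            merged = tuple(x + y for x, y in zip(protos[k], protos[l]))
            for idx in sorted((k, l), reverse=True):
                protos.pop(idx)
            protos.append(merged)  # p_new = p_l + p_k
    return jets
```

With two nearby particles and one far away in $\phi$, both metric choices return two jets for $R = 1$; the two nearby particles are merged by four-vector addition.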

The purpose of this paper is to introduce a new paradigm for jet clustering, called fuzzy
jets, based on probabilistic mixture modeling. Section 2 introduces the statistical concept
of a mixture model and describes the necessary modification to make the procedure IRC
safe. Section 3 gives one efficient method for clustering fuzzy jets based on the
Expectation-Maximization (EM) algorithm. Section 4 contains several examples comparing fuzzy jets
with sequential recombination and Sec. 5 describes how one might mitigate the impact of
overlapping proton-proton collisions (pileup). We conclude in Sec. 6 with some summary
remarks and outlook for the future.

2 Mixture Model Jets


Mixture models [11] are a statistical tool for clustering which postulate a particular class of
probability densities for the data to be clustered. Generically, for grouping $m$ $n$-dimensional
data points into $k$ clusters, the mixture model density is

$$p(x_1, \ldots, x_m \mid \theta) = \prod_{i=1}^{m} \left( \sum_{j=1}^{k} \pi_j f(x_i \mid \theta_j) \right), \qquad (2.1)$$

where $\pi_j$ is the unknown weight of cluster $j$ such that $\sum_j \pi_j = 1$ and
$f(x_i \mid \theta_j)$ is a probability density on $n$ dimensions with unknown parameters
$\theta_j$ to be learned from the data. A common choice for $f$ is the normal density $\Phi$
with $\theta_j = (\mu_j, \Sigma_j)$ for $\mu_j$ the $n$-dimensional mean and $\Sigma_j$ the
$n \times n$ covariance matrix. In the mixture model paradigm, the $\theta_j$ are the
cluster properties; in the Gaussian case, $\mu_j$ is the location of cluster $j$ and $\Sigma_j$
describes its shape in the $n$-dimensional space. When clustering with a finite mixture, the
number of clusters $k$ must be specified ahead of time^1, which is dual to the usual use of
sequential recombination^2 in which $k$ is learned and the size of jets is specified ahead of
time. The standard objective in (frequentist) mixture modeling is to select the parameters
$\theta_j$ which maximize the likelihood (Eq. (2.1)) of the observed dataset. Figure 1
illustrates what the learned event density might look like for $k = 3$ and Gaussian
$f = \Phi$ in $n = 2$ dimensions.
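To make Eq. (2.1) concrete, the following minimal sketch evaluates the per-point mixture density and the log of the likelihood for two-dimensional Gaussian components. It assumes isotropic covariances $\Sigma_j = \sigma_j^2 I$ purely to keep the example short; the function names are hypothetical.

```python
import math

def gaussian_2d(x, mean, sigma):
    """Isotropic two-dimensional normal density with covariance sigma^2 * I."""
    r2 = (x[0] - mean[0]) ** 2 + (x[1] - mean[1]) ** 2
    return math.exp(-0.5 * r2 / sigma ** 2) / (2.0 * math.pi * sigma ** 2)

def mixture_density(x, weights, means, sigmas):
    """Per-point density: sum_j pi_j * f(x | theta_j)."""
    return sum(w * gaussian_2d(x, m, s) for w, m, s in zip(weights, means, sigmas))

def log_likelihood(points, weights, means, sigmas):
    """Log of Eq. (2.1): the product over points becomes a sum of logs."""
    return sum(math.log(mixture_density(x, weights, means, sigmas)) for x in points)
```

Placing the component means on top of the data gives a larger log likelihood than placing them far away, which is the sense in which maximum likelihood selects cluster locations.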





Figure 1. An example of the learned per-particle probability density specified in Eq. (2.1) with
$k = 3$ and Gaussian $f = \Phi$ in $n = 2$ dimensions. One cluster is associated with each
component density $\Phi_i = \Phi(\cdot \mid \mu_i, \Sigma_i)$, where the dot $\cdot$ is a
placeholder for the function argument.


^1 There is a wealth of literature on the subject of choosing $k$; for a survey of methods, see [12]. The
likelihood monotonically increases with $k$; as alternatives to maximum likelihood, one can for instance look
for kinks in the likelihood as a function of $k$ [13].

^2 It is similar to the exclusive form of the $k_t$ sequential recombination scheme [14]. The exclusive
nature of the algorithm (and the minimization procedure used to find the jets) is similar to the XCone
algorithm [15, 16] that became public as this manuscript was in its final preparation.

An equivalent way of approaching mixture modeling is to view Eq. (2.1) as the density
used to generate the data. We view the data as having been drawn randomly from the
density specified in Eq. (2.1), with the following setup:

1. Throw $m$ independent and identical $k$-sided dice with probability $\pi_j$ to land on side
$j = 1, \ldots, k$ and label the outcomes $\lambda_1, \ldots, \lambda_m$.

2. Independent of the others, data point $i \in \{1, \ldots, m\}$ is drawn randomly from
$f(\cdot \mid \theta_{\lambda_i})$.
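The two-step generative picture above can be sketched directly with the Python standard library. The sketch assumes isotropic two-dimensional Gaussian components; the function name, the `seed` argument, and the return layout are hypothetical.

```python
import random

def sample_mixture(m, weights, means, sigmas, seed=0):
    """Draw m points from a 2D isotropic Gaussian mixture.

    Step 1: throw m k-sided dice with side probabilities pi_j (the labels lambda_i).
    Step 2: draw each point from the component selected by its label.
    """
    rng = random.Random(seed)
    labels = rng.choices(range(len(weights)), weights=weights, k=m)
    points = [
        (rng.gauss(means[j][0], sigmas[j]), rng.gauss(means[j][1], sigmas[j]))
        for j in labels
    ]
    return labels, points
```

Clustering inverts this process: the labels are hidden, and the membership probabilities discussed next are the posterior over them.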

Once $\theta$ and $\pi$ are learned by maximizing Eq. (2.1), we can compute
$q_{ij} = \Pr(\lambda_i = j \mid x_i)$, the posterior probability that $x_i$ was generated by
$f(\cdot \mid \theta_j)$ or, intuitively, the posterior probability that $x_i$ belongs to
cluster $j$. The $q_{ij}$ are the soft assignments of particles $i$ to jet $j$ and will play an
important role in Sec. 3 when we show how to maximize the likelihood
in Eq. (2.1). Jets produced with mixture modeling are called fuzzy jets because of the soft
memberships: every particle can belong to every jet with some probability^3. This can be
seen explicitly in Fig. 1 where the densities of all three clusters are everywhere nonzero,
so $q_{ij} > 0$ for all $j$. The idea of probabilistic membership was recently studied in the
context of the Q-jets algorithm [19] in which the same event is interpreted many times by
injecting randomness into the clustering procedure. Unlike Q-jets, fuzzy jets allocate the
soft membership functions deterministically throughout the clustering procedure. However,
like Q-jets, there is an ambiguity in how to assign kinematic properties to the clustered
jets. Fuzzy jets are defined by their shape (and location), not their constituents. This is in
contrast to anti-$k_t$ jets, which are defined by their constituents without an explicit shape
determined from the clustering procedure. One simple assignment scheme is to define the
momentum of a fuzzy jet $j$ as

$$p_{\text{jet } j} = \sum_{i=1}^{m} p_i \times \begin{cases} 1 & j = \arg\max_{j'} q_{ij'} \\ 0 & \text{else} \end{cases} \qquad (2.2)$$


In other words, this procedure assigns every particle to its most probable associated jet.
This scheme will be known as the hard maximum likelihood (HML) scheme, but is not the
only possible assignment algorithm. The dual problem in sequential recombination is the jet
area, which must be defined [20], whereas the jet kinematics are the 'natural' coordinates.
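A minimal sketch of the HML assignment is given below. It assumes the soft memberships are available as a nested list `q[i][j]` and that particles are `(px, py, pz, E)` tuples; both layouts and the function name are assumptions made for illustration.

```python
def hml_jet_momenta(four_vectors, q):
    """Sum each particle's four-vector into its most probable jet.

    four_vectors[i] is (px, py, pz, E); q[i][j] is the membership
    probability of particle i in jet j.
    """
    k = len(q[0])
    jets = [[0.0, 0.0, 0.0, 0.0] for _ in range(k)]
    for p, q_i in zip(four_vectors, q):
        j = max(range(k), key=lambda c: q_i[c])  # argmax over jets
        for a in range(4):
            jets[j][a] += p[a]
    return jets
```

Other assignment schemes, for example weighting each particle's momentum by its membership probability, would change only the inner loop.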


^3 Soft assignments for jets during clustering were studied in the context of the "optimal jet finder" [18]
which maximizes a function of the soft assignments.

We now specialize the likelihood in Eq. (2.1) to the case of clustering particles into jets
at a collider like the LHC. Consider a mixture model in two dimensions^4 with
$x_i = \vec{\rho}_i$. The resulting mixture model (MM) jets are inherently not IR safe: the
particle $p_T$ does not appear in the likelihood and therefore arbitrarily low energy particles
can influence the clustering procedure. Therefore, we add a modification to the log likelihood:

$$\log \mathcal{L}(\{p_{T,i}, \vec{\rho}_i\} \mid \theta) = \sum_{i=1}^{m} p_{T,i}^{\alpha} \log \left( \sum_{j=1}^{k} \pi_j f(\vec{\rho}_i \mid \theta_j) \right), \qquad (2.3)$$

where $\alpha$ is a weighting factor. Equation (2.3) is the log of Eq. (2.1) with the term
$p_{T,i}^{\alpha}$ inserted in the outer sum. For $\alpha > 0$, the resulting modified mixture
model (mMM) jets are IR safe, and when $\alpha = 1$, the jets are C safe. Therefore, for
$\alpha = 1$, the jets are IRC safe. Different choices of component densities $f$ in Eq. (2.3)
give rise to different IRC safe MM jet algorithms. We have studied several possibilities for
$f$, but for the remainder of this paper will specialize to (wrapped^4) Gaussian $f = \Phi$.
The resulting fuzzy jets are called modified Gaussian Mixture Model jets (mGMM) and are
parameterized by the locations $\mu_j$, the covariance matrices $\Sigma_j$, and the cluster
weights $\pi_j$. We initialize $\pi_j = 1/k$ and $\Sigma_j = I$.
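The effect of the $p_{T,i}^{\alpha}$ weight can be checked numerically with a small sketch of the modified log likelihood. The sketch assumes isotropic Gaussian components and ignores the wrapping of $\phi$; the function name and argument layout are hypothetical.

```python
import math

def modified_log_likelihood(pts, positions, weights, means, sigmas, alpha=1.0):
    """pT-weighted mixture log likelihood in the form of Eq. (2.3).

    pts[i] is the transverse momentum of particle i and positions[i] its
    (y, phi) location. Components are isotropic Gaussians (an assumption).
    """
    total = 0.0
    for pt_i, x in zip(pts, positions):
        density = 0.0
        for w, mu, s in zip(weights, means, sigmas):
            r2 = (x[0] - mu[0]) ** 2 + (x[1] - mu[1]) ** 2
            density += w * math.exp(-0.5 * r2 / s ** 2) / (2.0 * math.pi * s ** 2)
        total += pt_i ** alpha * math.log(density)
    return total
```

For $\alpha = 1$, splitting a particle of $p_T = 10$ into collinear pieces of $p_T = 4$ and $6$ leaves the value unchanged, while for $\alpha = 2$ it does not; adding a particle with vanishing $p_T$ changes the value by a vanishing amount for any $\alpha > 0$.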

Since practical procedures for maximizing the modified likelihood in Eq. (2.3) may
converge to stationary points that are not globally optimal, the output of a fuzzy jet algorithm
will depend on an initial setting of the cluster parameters $\theta$ and $\pi$. One simple
procedure, used exclusively for the rest of the paper, is to seed fuzzy jets based on the output
of a sequential recombination jet algorithm. This guarantees an IRC safe initial condition and
therefore the entire procedure is IRC safe. We now discuss practically how one can find the
maximum of the fuzzy jets likelihood.





^4 One must take care in selecting a class of densities appropriate for the angular quantity $\phi$. For more
details on the wrapped Gaussian distribution and motivation for its use in this context, see Appendix A.

3 Clustering Fuzzy Jets: the EM Algorithm


One iterative procedure for maximizing the mixture model likelihood in Eq. (2.1) is the
Expectation-Maximization (EM) algorithm [21-23]. After initializing the cluster locations
and prior density $\pi$, the following two steps are repeated:

Expectation: Given the current values of $\theta_j$, compute the fuzzy membership
probabilities $q_{ij}$.

Maximization: Given $q_{ij}$, maximize the expected modified complete log likelihood over the
parameters $\pi$, $\mu$, $\Sigma$.

The expected modified complete log likelihood has the form

$$\sum_{i=1}^{m} \sum_{j=1}^{k} p_{T,i}^{\alpha} \left( q_{ij} \log \Phi(\vec{\rho}_i; \mu_j, \Sigma_j) + q_{ij} \log \pi_j \right). \qquad (3.1)$$

Note that the expected modified complete log likelihood is not the same as the
modified log likelihood shown in Eq. (2.3). They differ in that the complete log likelihood
has the second sum outside the logarithm while Eq. (2.3) has the sum inside the logarithm.
The power of the EM algorithm is that maximizing the complete log likelihood yields a
fixed-point iteration that monotonically improves the original log likelihood. This desirable
property of the EM algorithm is still true when $\alpha > 0$; for a proof, see Appendix B. Many
choices for $f$ have closed form maxima for the M step; in the Gaussian $f = \Phi$ case outlined
above, the updates are given by

$$\mu_j^{*} = \sum_{i=1}^{m} \tilde{q}_{ij} \vec{\rho}_i, \qquad
\Sigma_j^{*} = \sum_{i=1}^{m} \tilde{q}_{ij} (\vec{\rho}_i - \mu_j^{*})(\vec{\rho}_i - \mu_j^{*})^{T}, \qquad
\pi_j^{*} = \frac{1}{\sum_{i=1}^{m} p_{T,i}^{\alpha}} \sum_{i=1}^{m} p_{T,i}^{\alpha} q_{ij}, \qquad (3.2)$$

where $\tilde{q}_{ij} = q_{ij} p_{T,i}^{\alpha} / \sum_{i'=1}^{m} q_{i'j} p_{T,i'}^{\alpha}$. The
well-known $k$-means clustering algorithm [24] can be recovered as the limit of
expectation-maximization in a Gaussian mixture model with $\Sigma = \sigma^2 I \to 0$. Figure 2
illustrates GMM clustering using the EM algorithm with $k = 2$ clusters. The EM algorithm
readily accommodates constraints on the model parameters. One constraint we will consider
throughout the rest of the paper is $\Sigma_j = \sigma_j^2 I$ for all $j$, which requires the
curves of constant likelihood in $(y, \phi)$ to be circular. We will see in the next section that
the learned value of $\sigma_j$ is useful for distinguishing jets originating from
different physics processes.
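The E and M steps can be sketched for the constrained case $\Sigma_j = \sigma_j^2 I$ as follows. This is an illustrative sketch under stated assumptions, not the implementation used for the studies in this paper: it ignores the wrapping of $\phi$, takes the E-step memberships to be the posterior $q_{ij} \propto \pi_j \Phi(\vec{\rho}_i; \mu_j, \sigma_j^2 I)$, uses the isotropic variance update $\sigma_j^2 = \frac{1}{2} \sum_i \tilde{q}_{ij} |\vec{\rho}_i - \mu_j|^2$ that follows from maximizing Eq. (3.1) in two dimensions, and adds a hypothetical numerical floor `sigma_min`. All names are assumptions.

```python
import math

def gauss2d(x, mu, sigma):
    """Isotropic 2D normal density with covariance sigma^2 * I (phi wrapping ignored)."""
    r2 = (x[0] - mu[0]) ** 2 + (x[1] - mu[1]) ** 2
    return math.exp(-0.5 * r2 / sigma ** 2) / (2.0 * math.pi * sigma ** 2)

def em_mgmm(pts, positions, seeds, alpha=1.0, n_iter=50, sigma_min=1e-3):
    """EM for pT-weighted Gaussian mixtures with Sigma_j = sigma_j^2 * I.

    Returns (means, sigmas, weights, q); q[i][j] holds the memberships
    from the last E step. sigma_min is a hypothetical numerical floor.
    """
    k, m = len(seeds), len(positions)
    means = [tuple(s) for s in seeds]   # e.g. seeded from anti-kt jet axes
    sigmas = [1.0] * k                  # Sigma_j = I
    weights = [1.0 / k] * k             # pi_j = 1/k
    w = [pt ** alpha for pt in pts]
    w_total = sum(w)
    q = []
    for _ in range(n_iter):
        # E step: posterior membership of particle i in jet j
        q = []
        for x in positions:
            dens = [weights[j] * gauss2d(x, means[j], sigmas[j]) for j in range(k)]
            norm = max(sum(dens), 1e-300)  # guard against underflow to zero
            q.append([d / norm for d in dens])
        # M step: closed-form pT-weighted updates
        for j in range(k):
            wj = sum(w[i] * q[i][j] for i in range(m))
            if wj <= 0.0:
                continue  # empty cluster: keep previous parameters
            mu = (sum(w[i] * q[i][j] * positions[i][0] for i in range(m)) / wj,
                  sum(w[i] * q[i][j] * positions[i][1] for i in range(m)) / wj)
            var = sum(w[i] * q[i][j] * ((positions[i][0] - mu[0]) ** 2 +
                                        (positions[i][1] - mu[1]) ** 2)
                      for i in range(m)) / (2.0 * wj)
            means[j] = mu
            sigmas[j] = max(math.sqrt(var), sigma_min)
            weights[j] = wj / w_total
    return means, sigmas, weights, q
```

On two well-separated groups of particles the learned means settle on the group centers, the weights approach each group's share of the total $p_T^{\alpha}$, and each $\sigma_j$ reflects the spread of its group.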






Figure 2. An illustration of the EM algorithm for $k = 2$. The circles represent data points, the
triangles represent the estimated cluster locations $\mu_j$, and the ellipsoids are equidensity
contours describing the shapes of the learned cluster distributions. In the E-step, bluer colors
correspond to higher values of $p_{i,\text{blue jet}}$.

4 Comparisons with Sequential Recombination and Jet Tagging


This section describes some numerical comparisons between sequential recombination and
fuzzy jets. Section 4.1 summarizes the simulation details with some first event displays
showing both fuzzy and sequential recombination jets. These two approaches to jet
clustering are studied over an ensemble of events in Sec. 4.2. A third subsection, Sec. 4.3,
illustrates that fuzzy jets capture new information about the hadronic final state, and in
the fourth subsection, Sec. 4.4, it is demonstrated that this new information can be used to
classify the jet type.

4.1 Details of the Simulation 

In order to study fuzzy jets in a realistic scenario, we run Monte Carlo (MC) simulations.
Three physics processes are generated using Pythia 8.170 [25, 26] at $\sqrt{s} = 8$ TeV.
Hadronically decaying W bosons and top quarks are used for studying hard 2- and 3-prong type
jets, respectively. To simulate high $p_T$ hadronic W decays, W' bosons are generated to decay
exclusively into a W and Z boson which subsequently decay into quarks and leptons,
respectively. The $p_T$ scale of the hadronically decaying W is set by the mass of the W' which
is tuned to 800 GeV for this study so that $p_T^W \lesssim 400$ GeV. In this $p_T$ range, the W
decay products are expected to merge within a cone of $R \simeq 1.0$ where
$\Delta R = \sqrt{\Delta\phi^2 + \Delta y^2} \sim 2 m_W / p_T^W$. A
sample enriched in 3-prong type jets is generated with $Z' \to t\bar{t}$, where the Z' mass
sets the energy scale of the hadronically decaying top quarks. In this analysis, we use
$m_{Z'} = 1.0$ TeV, which sets $p_T^t \lesssim 500$ GeV. To study the impact on signal versus
background, QCD dijets are generated with a range of $p_T$ that is approximately in the same
range as the relevant signal process. In all distributions, the QCD $p_T$ spectrum is weighted
to exactly match that of the signal to control for differences between signal and background
due only to the $p_T$ spectrum differences. Pileup is simulated by overlaying additional
independently generated minimum-bias interactions with each signal event. For the rest of this
section, the number of pileup interactions is $n_{PU} = 0$. See Sec. 5 for studies of
$n_{PU} > 0$.

For a comparison to fuzzy jets, anti-$k_t$ jets are clustered using FastJet [27] 3.0.3.
The signal processes are chosen such that jets with radius parameter $R = 1$ are most
appropriate in capturing the decay products of the heavy particles. The anti-$k_t$ jets are
trimmed [28] by re-clustering the constituents into $R = 0.3$ $k_t$ subjets and dropping those
which have $p_T^{\text{subjet}} < 0.05 \times p_T^{\text{jet}}$. Anti-$k_t$ jets are also used to
seed the fuzzy jet clustering; the $p_T$ threshold for this initialization is 5 GeV^5, and the
impact of this choice is studied in Appendix C.


^5 This low threshold guarantees that there are enough seed jets around to capture the radiation from the
underlying event. Another strategy could be to use the Event Jet (see Sec. 5) even when there is no pileup.

To model the discretization and finite acceptance of a real detector, a calorimeter of
towers with size $0.1 \times 0.1$ in $(y, \phi)$ extends out to $|y| = 5.0$. The energies of the
simulated particles incident upon a particular cell are added as scalars and the four-vector
$p_j$ of any particular tower $j$ is given by

$$p_j = \sum_{i \text{ incident on } j} E_i \left( \frac{\cos\phi_j}{\cosh y_j}, \frac{\sin\phi_j}{\cosh y_j}, \frac{\sinh y_j}{\cosh y_j}, 1 \right). \qquad (4.1)$$
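Equation (4.1) builds a massless four-vector pointing at the tower center. A minimal sketch, with a hypothetical function name and a `(px, py, pz, E)` return layout chosen for illustration:

```python
import math

def tower_four_vector(energies, y, phi):
    """Massless tower four-vector (px, py, pz, E) in the form of Eq. (4.1).

    energies are the scalar energies of the particles incident on the
    tower; (y, phi) is the tower center.
    """
    e = sum(energies)
    return (
        e * math.cos(phi) / math.cosh(y),
        e * math.sin(phi) / math.cosh(y),
        e * math.tanh(y),  # sinh(y) / cosh(y)
        e,
    )
```

The resulting vector has $p_T = E / \cosh y$ and zero invariant mass, since $1/\cosh^2 y + \tanh^2 y = 1$.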

To simulate a particle flow reconstruction, the sum in Eq. (4.1) contains only neutral
particles for $|y| < 2.5$ and both charged and neutral particles for $2.5 < |y| < 5$. Charged
particles within $|y| < 2.5$ are individually added to the list of inputs for clustering, unless
they originate from a pileup collision. Anti-$k_t$ jet momenta are corrected for pileup on
average using area subtraction [20]. The median pileup density, $\rho$, is estimated by clustering
hard scatter particles, neutral pileup particles, and charged pileup particles in the range
$|y| < 2.5$ using $k_t$ $R = 0.4$ jets in FastJet with ghosted areas.

A representative event display for a $Z' \to t\bar{t}$ event is shown in Figure 3. The top right
plot in Figure 3 shows the anti-$k_t$ jets with $p_T > 5$ GeV as filled in (partial) circles. The
filled area is determined by the jet area and there are deviations from circles only when a
low $p_T$ jet is close to a higher $p_T$ jet. The two top quarks are depicted as red stars, each
of which sits at the center of one of the two high $p_T$ jets. The top left plot in Figure 3
shows mGMM fuzzy jets. The fuzzy jets are depicted by their 1-$\sigma$ contours. In contrast to
the anti-$k_t$ jets, fuzzy jets vary widely in radial size. Gray crosses in the top left plot
indicate the locations of the anti-$k_t$ jets shown in the top right plot. The long tail of each
cross points toward the fuzzy jet for which it was the seed. The two jets closest to the top
quarks did not move a long distance from the seed location, though the size did change
significantly from $R = 1$. The lowest $p_T$ fuzzy jet moved a long distance from the seed to the
final location.

Another new feature of fuzzy jets compared to anti-$k_t$ jets is that they can overlap
with each other. This is seen by the four jets with overlapping 1-$\sigma$ contours in the top
left plot of Figure 3. Overlapping mGMM jets are an expression of structure inadequately
captured with a single Gaussian shape. The ability to learn features at different scales in
the same event without relying on a size parameter like the anti-$k_t$ radius parameter can
give mGMM fuzzy jets additional descriptive power over anti-$k_t$ and other traditional jet
algorithms. This particular event will be used again for reference in Section 5 during a
discussion on the performance of the technique in the presence of pileup interactions.

4.2 Kinematic Properties of Fuzzy Jets 

Jets clustered according to the mGMM algorithm capture similar hard jet locations and jet
energy (under HML) as those clustered by anti-$k_t$ with $R = 1$. In Figure 4, the $p_T$
distributions of the highest $p_T$ jets for three different physics processes are plotted as
given by anti-$k_t$ $R = 1.0$ and mGMM jets. The anti-$k_t$ $p_T$ distributions are re-weighted
so that all three processes have identical distributions in the left plot. On the right, the
distributions are in good correspondence with those in the left plot, though there is a slight
shift of the peak. Additionally, the $(y, \phi)$ locations of the highest $p_T$ mGMM jets are in
excellent correspondence with the locations of the anti-$k_t$ jets, as was already discussed in
reference to Figure 3.

Figure 3. A representative event display for a $Z' \to t\bar{t}$ event. In the top left plot, gray
circles show the location and size of mGMM fuzzy jets after clustering, with the size of the
circle indicating 1-$\sigma$ contours in the detector; the black circle indicates the highest
$p_T$ jet with HML particle assignment. The small filled colored circles are the particles, with
the color and size indicating their energy. In each case, the events have been rotated in $\phi$
to place the truth top quark, indicated by a red star, at a fixed value of $\phi$. Anti-$k_t$ jet
locations are shown with gray crosses in the left hand plot, the long tail of which points
towards the mGMM jet for which it was a seed. In the top right plot, anti-$k_t$ $R = 1.0$ jets
passing a 5 GeV $p_T$ cut are shown as discs under the particles indicating their active area,
with centers the same as the crosses in the left hand side. Shades of gray in the anti-$k_t$
discs have no scale and are meant to aid the eye, but go from low $p_T$ (lighter) to high $p_T$
(darker).

The mGMM algorithm differs from the anti-$k_t$ algorithm in how the size and structure
of clustered jets are determined. This was already shown qualitatively in Figure 3: fuzzy jets
come in a variety of sizes, and can overlap in complex ways. The matter is further complicated
by the choice of particle assignment scheme for defining kinematic properties in the mGMM
family of algorithms. The volume and shape of the catchment area of a fuzzy jet depend in
general on the full set of learned jet locations and model parameters, $\Sigma$. In contrast, for
anti-$k_t$ jets, the catchment area is bounded from above by $R$ and is only smaller when
another high $p_T$ jet is nearby. The nonlocality of the mGMM clustering model can be observed
quantitatively by examining jet mass, given in Eq. (4.2), which is sensitive to the distribution
of energy within a jet. The jet mass distributions for both mGMM (HML assignment) and
anti-$k_t$ jets are shown in Figure 5, with the same $p_T$ weighting as in Figure 4. Even though
fuzzy jets learn the same core (i.e. $p_T$) for jets as anti-$k_t$, they do not learn the same
mass. The white dashed lines in Figure 5 mark the locations of the W boson and top quark masses
at about 80 GeV and 175 GeV, respectively [29]. For both anti-$k_t$ and fuzzy jets, there
are clear peaks at the W mass for the boosted $W \to qq'$ from W' simulated events and
at the top quark mass for $Z' \to t\bar{t}$ simulated events. However, there are clear
differences in the shape of these distributions. The W mass peak for W' events is more peaked
for fuzzy jets, though there is also a low-mass contribution to the distribution. For Z' events,
the top quark mass peak is less populated for fuzzy jets, with events instead shifted
to the W mass peak. This often happens when the three-prong structure is learned by
two (overlapping) fuzzy jets. The QCD multi-jet jet mass distribution is also qualitatively
different between fuzzy jets and anti-$k_t$ jets, with the former shifted to lower values of the
mass.

Figure 4. The jet $p_T$ for the leading anti-$k_t$ jet (left) and leading fuzzy jet under the HML
particle assignment scheme (right). All the processes are re-weighted so that the anti-$k_t$
$p_T$ spectra are the same.



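$$m_{\text{jet}}^2 = \left( \sum_{i \in \text{jet}} E_i \right)^2 - \left| \sum_{i \in \text{jet}} \vec{p}_i \right|^2 \qquad (4.2)$$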
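The jet mass is the invariant mass of the summed constituent four-vectors, so under the HML scheme it can be computed directly from the particles assigned to a jet. A minimal sketch, assuming `(px, py, pz, E)` tuples and a hypothetical function name:

```python
import math

def jet_mass(four_vectors):
    """Invariant mass of the sum of constituent four-vectors (px, py, pz, E)."""
    px = sum(p[0] for p in four_vectors)
    py = sum(p[1] for p in four_vectors)
    pz = sum(p[2] for p in four_vectors)
    e = sum(p[3] for p in four_vectors)
    m2 = e * e - (px * px + py * py + pz * pz)
    return math.sqrt(max(m2, 0.0))  # clamp small negative rounding errors
```

A single massless constituent gives zero mass, while two massless constituents at a wide opening angle give a large mass, which is why the mass is sensitive to how energy is distributed inside the jet.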


4.3 New Information from Fuzzy Jets 

The properties $\Sigma$ of a fuzzy jet can be useful in distinguishing jets resulting from
different physics processes. In the simplest realization of mGMM jets already described above,
$\Sigma = \sigma^2 I$ where $\sigma$ is a measure of the size of the core of a jet. Although
$\sigma$ is a simple variable to construct from the wealth of data available after clustering
with the mGMM algorithm, it captures at least some of the schematic differences in the
likelihood for $Z' \to t\bar{t}$ and $W' \to WZ$ relative to a QCD multijet background (shown
below).

The left plot of Figure 6 shows the average $\sigma$ over all fuzzy jets in an event. The
generic fuzzy jet is rather independent of the physics process and tends to be quite large.
This is because fuzzy jets capturing hard radiation tend to be small, but most of the fuzzy
jets needed to capture the sparse radiation pattern from the underlying event need to be
large. In contrast, the $\sigma$ for the leading mGMM jets are shown in the right plot of Figure 6
for each of the three physics processes. As expected, the relative size of the highest
$p_T$ jets depends on the physics process. For the decay of a boosted heavy particle with
mass $m$ and transverse momentum $p_T$, the radial size of the decay products scales as
$2m/p_T$ and thus, since the $p_T$ distribution in Figure 6 is fixed, one would expect that the
top quark jets have a larger $\sigma$ than the W boson jets, which are in turn larger than the
quark and gluon jets. This is reflected^6 in the three peaks in the left plot of Figure 6. The
separation between the three physics processes is not 100% correlated with the naive scaling
$m/p_T$ of the corresponding leading anti-$k_t$ jets. Figure 7 shows that there is a strong
positive correlation between $\sigma$ and the corresponding anti-$k_t$ mass over $p_T$, as
expected. There are two peaks in the correlation for the $Z' \to t\bar{t}$ events because the
anti-$k_t$ mass spectrum has peaks at both the top mass and the W boson mass. While the
correlations between the fuzzy jet $\sigma$ and the anti-$k_t$ $m/p_T$ are non-negligible, they
are far from unity and thus there may be additional
information contained in the fuzzy jet $\sigma$ that is useful for tagging the flavour of a jet.

Figure 5. The jet mass for the leading anti-$k_t$ jet (left) and leading fuzzy jet under the HML
particle assignment scheme (right), in an anti-$k_t$ leading jet $p_T$ window of 350 to 450 GeV.
All the processes are re-weighted so that the anti-$k_t$ $p_T$ distributions are the same. The
dashed white lines mark $m_W = 80.4$ GeV and $m_t = 173.3$ GeV.
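As a worked example of the $2m/p_T$ scaling, the snippet below evaluates the expected radial size for W boson and top quark jets. The masses are those quoted in the Figure 5 caption; the choice of $p_T = 400$ GeV, the middle of the 350 to 450 GeV window, and the function name are assumptions made for illustration.

```python
def boosted_decay_size(mass_gev, pt_gev):
    """Naive radial size 2m/pT of the decay products of a boosted particle."""
    return 2.0 * mass_gev / pt_gev

size_w = boosted_decay_size(80.4, 400.0)     # about 0.40
size_top = boosted_decay_size(173.3, 400.0)  # about 0.87
```

At fixed $p_T$ the top quark jet is expected to be roughly twice as wide as the W boson jet, in line with the ordering of $\sigma$ described above.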


^6 At leading order, there is an exact relationship between $\sigma$ and the jet mass; see Appendix D.

Figure 6. The learned value of $\sigma$ for the highest $p_T$ jets (right) for various physics
processes.

Figure 7. The left and right plots show the correlation between $\sigma$ and the leading jet
anti-$k_t$ mass divided by $p_T$ in an anti-$k_t$ $p_T$ window of 350 to 450 GeV for
$Z' \to t\bar{t}$ and QCD events, respectively. Indicated in the lower right of each figure is
the linear correlation between the variables.

4.4 Fuzzy Jets for Tagging


In this section, $\sigma$ is compared with another class of jet substructure variables known to
be useful for tagging: the N-subjettiness ratios [30]. N-subjettiness moments are defined over
a set of $N$ axes^7, and calculated as:

$$\tau_N = \frac{1}{d_0} \sum_k p_{T,k} \min\{\Delta R_{1,k}, \Delta R_{2,k}, \ldots, \Delta R_{N,k}\}, \qquad (4.3)$$

where the sum runs over the jet constituents $k$, $\Delta R_{a,k}$ is the distance in $(y, \phi)$
between constituent $k$ and axis $a$, and $d_0$ is the normalization

$$d_0 = \sum_k p_{T,k} R_0, \qquad (4.4)$$

and $R_0$ is the radius of the jet. In practice, the useful variables for determining how much
more $i$-pronged a jet is compared to $j$-pronged are the N-subjettiness ratios:

$$\tau_{ij} = \frac{\tau_i}{\tau_j}. \qquad (4.5)$$

The variable $\tau_{21}$ is often used for the separation of W from QCD jets [31, 32] and is a
measure of the compatibility of a jet with a 2-prong hypothesis compared to a 1-prong
hypothesis. A low value of $\tau_{21}$ indicates that the jet likely has a 2-prong structure.
Similarly, $\tau_{32}$ is useful for top tagging in that it measures whether a 3-prong structure
is a better description of a jet relative to a 2-prong structure.
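Equations (4.3) to (4.5) translate into a short function once a set of axes is given. The sketch below takes the axes as input rather than finding them (the axis optimization used in this paper is not reproduced here); the `(pT, y, phi)` constituent layout and the function names are assumptions.

```python
import math

def delta_r(y1, phi1, y2, phi2):
    """Distance in (y, phi), with the phi difference wrapped into [-pi, pi)."""
    dphi = (phi1 - phi2 + math.pi) % (2.0 * math.pi) - math.pi
    return math.hypot(y1 - y2, dphi)

def tau_n(constituents, axes, r0=1.0):
    """N-subjettiness for constituents (pT, y, phi) and N axes (y, phi)."""
    d0 = sum(c[0] for c in constituents) * r0
    total = sum(
        c[0] * min(delta_r(c[1], c[2], ax[0], ax[1]) for ax in axes)
        for c in constituents
    )
    return total / d0

# Two constituents sitting exactly on two axes: a perfectly 2-prong jet.
jet = [(60.0, 0.0, 0.3), (40.0, 0.0, -0.3)]
tau1 = tau_n(jet, [(0.0, 0.0)])
tau2 = tau_n(jet, [(0.0, 0.3), (0.0, -0.3)])
tau21 = tau2 / tau1
```

Here $\tau_2$ vanishes because every constituent lies on an axis, so $\tau_{21}$ takes its minimum value, the signature of a 2-prong jet.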

The rest of this section contains comparisons of the performance of $\sigma$ relative to
$\tau_{21}$ for separating W from QCD jets, as well as $\sigma$ relative to $\tau_{32}$ for
tagging $Z' \to t\bar{t}$ amongst a QCD jet background. In Figure 8, a $k$-nearest neighbors
classifier was trained with 2-fold cross validation in TMVA [33]. The left plot in Figure 8
demonstrates an increase in performance for discriminating $Z' \to t\bar{t}$ from QCD relative
to using $\tau_{32}$ alone. The fuzzy jet $\sigma$ is roughly equally useful to the
N-subjettiness ratio at a signal efficiency of 0.85, and using both variables greatly improves
background rejection. Similar results can be seen in the right plot of Figure 8, where $\sigma$
boosts background rejection relative to $\tau_{21}$ alone. In each case, the training and
classification was performed in a mass window around the particles of interest, the top quark
mass in the $Z' \to t\bar{t}$ sample and the W boson mass to discriminate $W \to qq'$ from QCD.

The comparisons of the fuzzy jet $\sigma$ and N-subjettiness are intended to be an illustrative
example. As discussed in the opening of this section, $\sigma$ is just one variable that can
be constructed by using mGMM clustered jets. Expanded studies of the various learned
parameters could yield additional variables, or the full learned parameter set could
be passed to an off-the-shelf classifier or machine learning model.


^7 We use the "one-pass" $k_t$ axes optimization technique, which uses an exclusive $k_t$ algorithm to find $N$
axes and then refines them by minimizing the N-subjettiness value.

Figure 8. The tagging performance of $\sigma$ relative to $\tau_{32}$ ($\tau_{21}$) for
distinguishing top quarks (W bosons) from a QCD background is shown on the left (right). The
random tagger keeps a fixed fraction of all events, regardless of their origin, and is a lower
bound on the performance of any tagger.

5 Underlying Event and Pileup

As with any new jet algorithm or jet variable, understanding the effect of pileup vertices
from additional proton-proton collisions is essential to make meaningful statements about
how the method will be applicable to real data analyses at the LHC. Studying pileup in
the context of mGMM jets is complicated by the effective catchment area of the jets. For
hierarchical-agglomerative algorithms like anti-$k_t$, the catchment area scales with the radius
parameter. However, fuzzy jets can have infinite catchment area because the likelihood for
particle membership is nonzero for any finite distance and arrangement of Gaussian jets
and particles. Furthermore, the catchment area can change depending on the other jets in
an event. Although this effect also occurs in the hierarchical-agglomerative case, the effect
is much more pronounced in the mGMM clustering algorithm, with some jets having finite
catchment areas while others cluster infinite area.

The challenge of pileup for fuzzy jets is illustrated in Figure 9, where the same event
is shown with $n_{PU} = 0$, and with $n_{PU} = 40$. The event displays show the central region of
the detector, where most of the decay products of the hard scatter lie. Qualitatively, it can
be seen that the introduction of additional interaction vertices broadens all of the mGMM
jets. This broadening clearly impacts the power of $\sigma$ for differentiating QCD background
from signal processes.


Figure 9. mGMM jets defined according to Section 2 with an isotropic kernel are broadened as
a result of the introduction of additional pp pileup vertices. The same hard scatter (Pythia 8)
is clustered twice, on the left with $n_{PU} = 0$ and on the right with $n_{PU} = 40$. Vertical
dashed lines at $\eta = \pm 2.5$ show the extent of a simulated tracker with the same $\eta$
extent as that used at ATLAS and CMS. Charged pileup falling within the extent of the simulated
tracker is discarded before clustering and the aggregation of particles into towers.

The next sections explore two methods for mitigating the impact of pileup in relation
to fuzzy jets, illustrated with the variable $\sigma$.

5.1 Changing $\alpha$ for Pileup Suppression

In Section 2, it was discussed that choosing a = 1 in tlie likelihood (Eq. (2.3)) guarantees 
IRC safety. With a = 1, tlie mGMM algoritlim treats liard structure and soft structure 
linearly in the particle or tower pT- However, one can exploit tlie fact that a is dispropor- 
tionately a measure of tlie shape and extent of the leading jet hard structure to make the 
variable more resilient to the effects of pileup. In particular, choosing a > 1 stabilizes a 
at high npu because so long as the average input particle pT dne to pileup is significantly 
smaller than the pT of the particles constituting the leading jet hard structure, the change 
in likelihood will be suppressed roughly according to (pT,hs/PT,Pu)°'- An example of this 
effect is illustrated in Figúre 10, which shows the samé event as in Figúre 9. The price for 
adjusting a is the loss of collinear safety. Varying a is not explored further, as Section 5.2 
demonstrates a method for dealing with pileup effectively that does not rely on moving o. 
away from the IRC safe valne of one. 
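
As a rough numerical illustration of this suppression, the following Python sketch compares the share of the total $p_T^{\alpha}$ weight carried by pileup particles for $\alpha = 1$ and $\alpha = 2$. The multiplicities and momenta are invented toy values, and `pileup_weight_fraction` is a hypothetical helper introduced only for this example; it is not part of any fuzzy jets implementation.

```python
def pileup_weight_fraction(hard_pts, pileup_pts, alpha):
    """Fraction of the summed pT**alpha weight that comes from pileup particles."""
    hard = sum(pt ** alpha for pt in hard_pts)
    pileup = sum(pt ** alpha for pt in pileup_pts)
    return pileup / (hard + pileup)


# Toy inputs (assumed values): 4 hard particles at 50 GeV, 200 pileup particles at 0.5 GeV.
hard_pts = [50.0] * 4
pileup_pts = [0.5] * 200

for alpha in (1.0, 2.0):
    frac = pileup_weight_fraction(hard_pts, pileup_pts, alpha)
    print(f"alpha = {alpha}: pileup weight fraction = {frac:.4f}")
```

With these toy numbers the pileup share of the weight falls from one third at $\alpha = 1$ to about half a percent at $\alpha = 2$, which is the sense in which a larger $\alpha$ makes the fit less responsive to soft pileup.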


[Figure 10 panels: Pythia 8, $\alpha = 2$, $n_{PU} = 0$ (left); Pythia 8, $\alpha = 2$, $n_{PU} = 40$ (right).]
Figure 10. Clustering in the mGMM model with $\alpha = 2$. There is little broadening between the
$n_{PU} = 0$ (left) and $n_{PU} = 40$ (right) cases, but jets at the locations of the tops in the event are
substantially narrower than in the case where $\alpha = 1$, even with $n_{PU} = 0$ (compare to Figure 9).
Under the ML particle assignment, the $\alpha = 2$ algorithm identifies the other top as the highest $p_T$
jet in the event, demonstrating the difficulty in dealing with fuzzy jet kinematics.


5.2 Tower Subtraction and the Event Jet: Effective Pileup Correction 

Recent developments in pileup mitigation have led to several algorithms for correcting jet
inputs before jet clustering begins. Such techniques include Pileup Per Particle Identification
(PUPPI), Constituent Subtraction, and SoftKiller [34-36]. One simple input-correction
scheme is to subtract from each calorimeter tower the estimated pileup $p_T$ density per unit
area, $\rho$, multiplied by the area of the tower in the detector. As a first step, $\rho$ is calculated in
the same way as described in Sec. 4.1. Tower momenta are then corrected according to
Eq. (5.1), where $p_{T,s}$ is the corrected momentum, $p_{T,o}$ is the original momentum, and $A$ is
the area of the tower. In this study, all towers have area $0.1 \times 0.1$ in $y$-$\phi$ space.

$$p_{T,s} = \max\{p_{T,o} - \rho A,\ 0\}. \tag{5.1}$$
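
The tower correction of Eq. (5.1) can be sketched in a few lines of Python. The function name `subtract_pileup`, the toy tower momenta, and the value of $\rho$ are assumptions made for illustration; the estimate of $\rho$ itself (Sec. 4.1) is taken as a given input here.

```python
def subtract_pileup(tower_pts, rho, tower_area=0.1 * 0.1):
    """Apply Eq. (5.1) tower by tower: pT_s = max(pT_o - rho * A, 0)."""
    return [max(pt - rho * tower_area, 0.0) for pt in tower_pts]


# Toy towers in GeV and an assumed pileup density rho in GeV per unit y-phi area.
towers = [0.1, 0.5, 30.0]
corrected = subtract_pileup(towers, rho=20.0)
print(corrected)
```

Clamping at zero means a tower can never acquire negative momentum, so towers below the estimated pileup level simply drop out of the clustering inputs.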

While subtracting the average $p_T$ background from towers before clustering is a relatively
safe way of reducing the effect of pileup, at least when the $p_T$ scale of the tower-to-tower
fluctuations is small compared to the hard scatter $p_T$ scale, it would still be helpful to
systematically address the question of catchment areas. The mGMM clustering algorithm
provides a natural framework in which to think about pileup because the algorithm
deals fundamentally with likelihoods, and the pileup likelihood is to leading order
uniform over the detector (the same observation motivates the area-subtraction technique). This
motivates modifying the mGMM likelihood using a technique we call the event
jet.

In addition to learning $k$ mGMM jets throughout clustering, the event jet adds
a background contribution to the likelihood which attempts to capture the intuition
of a uniform contribution to the particle likelihood due to pileup. Constraints are further
imposed on the event jet so that it has constant likelihood during
the clustering process, which keeps the necessary modifications to the algorithm
simple.

Practically, the effect of the event jet can be parameterized through the introduction
of an algorithmic parameter $\gamma$. Particle membership probabilities change according to
Eq. (5.2) with corresponding changes to the analytical M step for the Gaussian kernel type.
The choice of $\gamma$ is important, and it should reflect the fact that not all events
have the same contributions due to pileup. Although
there is no strict way of dealing with this issue, it is reasonable to replace $\gamma$ by a meaningful
combination of parameters which is sensitive to our estimate of the amount of pileup in
a particular event. We have chosen to take $\gamma = \rho A \gamma_0$, where $\rho$ is our estimate of the $p_T$
density due to pileup, $A$ is the calorimeter area, and $\gamma_0$ is a parameter of the algorithm
controlling the strength of the event jet. Initial studies with the event jet indicate that
introducing a $\rho$-dependent $\gamma$ is much more effective than a $\rho$-independent one.


$$q_{ij} = \frac{p_{ij}}{\gamma + \sum_k p_{ik}} \tag{5.2}$$
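
A minimal Python sketch of the modified membership computation follows. It reads Eq. (5.2) as $q_{ij} = p_{ij}/(\gamma + \sum_k p_{ik})$ and assumes that $p_{ij}$ is the unnormalized likelihood of particle $i$ under jet $j$; both the reading and the interpretation of the leftover probability as the event jet share are assumptions of this example, as are the function names and toy numbers.

```python
def event_jet_gamma(rho, area, gamma0):
    """gamma = rho * A * gamma0, the choice described in the text."""
    return rho * area * gamma0


def memberships_with_event_jet(p, gamma):
    """Return q[i][j] = p[i][j] / (gamma + sum_k p[i][k]).

    p[i][j] is ASSUMED to be the unnormalized likelihood of particle i under jet j.
    """
    q = []
    for row in p:
        denom = gamma + sum(row)
        q.append([p_ij / denom for p_ij in row])
    return q


# Toy likelihoods: particle 0 sits near jet 0, particle 1 is far from both jets.
p = [[0.8, 0.2], [1e-4, 1e-4]]
gamma = event_jet_gamma(rho=1.0, area=1.0, gamma0=0.01)  # assumed toy values
for i, row in enumerate(memberships_with_event_jet(p, gamma)):
    print(i, row, "event jet share:", 1.0 - sum(row))
```

In this toy case the particle close to a jet keeps almost all of its membership in the fuzzy jets, while the distant particle is assigned mostly to the constant background term, so it pulls much less on the jet parameters in the following M step.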


Studies of pileup conditions similar to LHC Run 1, with ~ 20 pileup interactions,
indicate that with a 5 GeV $p_T$ cut, $\gamma_0 = 0.01$ provides reasonable stability of the learned $\sigma$.
This is demonstrated qualitatively in Figure 11, in which the tower and event jet corrections
are applied to the same event shown in Figure 3 at both $n_{PU} = 0$ and $n_{PU} = 40$. Unlike
any of the methods discussed previously, this method of correction maintains IRC safety,
demonstrates very little jet broadening at $n_{PU} = 40$, and is not drastically different in
its qualitative features from the standard mGMM algorithm. Note that the
assignment of towers to jets under the HML scheme is impacted by the event jet because
many towers belong to the event jet with higher probability than to any of the other fuzzy jets.
To preserve tower-to-jet assignments under pileup, a smaller value of $\gamma_0$ should be chosen.
The event jet is useful instead because it changes the dynamics of clustering, making jets
less sensitive to soft radiation far away from the jet axis during the EM update steps, and
therefore increasing the stability of the hard core that is eventually clustered.

[Figure 11 panels: Pythia 8, corrected, $n_{PU} = 0$ (left); Pythia 8, corrected, $n_{PU} = 40$ (right).]
Figure 11. Jet correction using tower subtraction and the event jet with parameter $\gamma_0 = 0.01$.
The two leading $p_T$ jets are almost identical in size in the left and right insets, which show the
$n_{PU} = 0$ and $n_{PU} = 40$ cases respectively. Although many of the other jets change (including
the migration of jets to higher $|\eta|$ as a result of the simulated tracker), those that give the $\sigma$ and
sub-leading $\sigma$ variables are insensitive to the effect of pileup.

A quantitative study of the pileup mitigation suggested qualitatively by Figure 11
requires an ensemble of events. Figure 12 shows how the mean and standard deviation of the
learned $\sigma$ evolve with $n_{PU}$. The uncorrected $\sigma$ is shown in red downward-pointing triangles
while the tower subtraction and event jet corrections are shown in blue upward-pointing
triangles. For both $Z' \to t\bar{t}$ and QCD, the pileup dependence is dramatically reduced with
the tower subtraction and the event jet. The uncorrected mean $\sigma$ increases as a function of
$n_{PU}$, and the standard deviation of the uncorrected
$\sigma$ actually decreases beyond $n_{PU} \sim 5$, as all of the fuzzy jets become the same size. For
modest levels of pileup, tower subtraction and the event jet maintain the mean
and standard deviation of the $\sigma$ distribution.

Figure 12. For both QCD and $Z' \to t\bar{t}$ samples, using the pileup correction (blue triangles) via the
event jet and tower subtraction stabilizes the mean relative to the uncorrected samples (red inverted
triangles), and prevents widening of the $\sigma$ distribution in pileup conditions somewhat worse than
during Run 1 at the LHC.








6 Conclusions 


The modified mixture model algorithms provide a new way of looking at whole event structure.
In contrast to the usual uses of hierarchical-agglomerative algorithms like anti-$k_t$, the
number of seeds is fixed ahead of time and their properties are learned during the clustering
process. The learned parameters provide a new set of handles for distinguishing jets of different
types. Even simple variables constructed out of the learned parameters of a mixture
of isotropic Gaussian jets, like $\sigma$, offer complementary information to the $n$-subjettiness
variables $\tau_{21}$ and $\tau_{32}$ for tagging W boson and top quark jets. Even though the variable $\sigma$
is sensitive to changes in pileup conditions, small modifications to the fuzzy jets algorithm
(correcting jet inputs and adding a pileup likelihood) can mitigate the impact of pileup.

Fuzzy jets are a new paradigm for jet clustering in high energy physics. These IRC safe
likelihood-based clustering schemes set the stage for many possibilities for future studies 
related to jet tagging, probabilistic clustering, and pileup suppression. 

7 Acknowledgments 

We would like to thank Jesse Thaler for useful discussions and helpful feedback on the 
manuscript. In addition, we thank Gavin Salam for useful comments on the algorithm 
description. This work is supported by the US Department of Energy (DOE) Early Career 
Research Program and grant DE-AC02-76SF00515. BN is supported by the NSF Graduate 
Research Fellowship under Grant No. DGE-4747 and by the Stanford Graduate Fellowship. 



A Wrapped Gaussian 


In the EM algorithm described in Sec. 3, there are explicit (and implicit) dependencies on
the topology. For instance, if a Gaussian density is used to model $\phi$, then, in the E step, a
particle with $\phi_i$ near $2\pi$ will be deemed far from a cluster with location $\phi_j$ near 0. To avoid
this undesirable behavior and enforce the equivalence of the angles 0 and $2\pi$, we associate
$\phi$ with a wrapped Gaussian density and $y$ with a standard Gaussian density:




$$f(\phi;\mu_\phi,\sigma) = \sum_{l=-\infty}^{\infty} \mathcal{N}(\phi;\mu_l,\sigma^2) = \frac{1}{\sqrt{2\pi\sigma^2}} \sum_{l=-\infty}^{\infty} \exp\left[-\frac{(\phi-\mu_l)^2}{2\sigma^2}\right], \tag{A.1}$$

where $\mathcal{N}$ is a normal distribution and $\mu_l = \mu_\phi + 2\pi l$. In order to approximate the
sum in Eq. (A.1), we take only the leading contribution by choosing $\mu_{l^*}$ for
$l^* = \arg\min_{l'} |\phi - \mu_{l'}|$. As other contributions are exponentially suppressed, this is a
good approximation and recovers continuity near 0 and $2\pi$. Figure 13 illustrates the improved
clustering behavior that results when $\phi$ is modeled using the wrapped Gaussian
approximation in place of the standard Gaussian density.

Figure 13. A three-particle event display illustrating the results of fuzzy jet clustering using a
Gaussian density for $\phi$ (left) and a wrapped Gaussian density approximation for $\phi$ (right).
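
The leading-term approximation to the wrapped Gaussian can be sketched in Python as follows. The function names and the toy values of $\phi$, $\mu_\phi$, and $\sigma$ are assumptions chosen for illustration; the truncated sum serves only as a reference for how good the single-image approximation is when $\sigma \ll 2\pi$.

```python
import math


def nearest_image_offset(phi, mu):
    """Signed distance from phi to the closest periodic image mu + 2*pi*l."""
    l_star = round((phi - mu) / (2.0 * math.pi))
    return phi - (mu + 2.0 * math.pi * l_star)


def wrapped_gaussian_leading(phi, mu, sigma):
    """Leading-term approximation: keep only the image of mu closest to phi."""
    d = nearest_image_offset(phi, mu)
    return math.exp(-d * d / (2.0 * sigma ** 2)) / math.sqrt(2.0 * math.pi * sigma ** 2)


def wrapped_gaussian_truncated(phi, mu, sigma, n_images=5):
    """Reference value: the sum over images truncated to |l| <= n_images."""
    norm = math.sqrt(2.0 * math.pi * sigma ** 2)
    return sum(
        math.exp(-((phi - mu - 2.0 * math.pi * l) ** 2) / (2.0 * sigma ** 2)) / norm
        for l in range(-n_images, n_images + 1)
    )


# Toy values: a particle and a jet axis on either side of the 0 / 2*pi seam.
phi, mu, sigma = 2.0 * math.pi - 0.05, 0.05, 0.3
print(wrapped_gaussian_leading(phi, mu, sigma), wrapped_gaussian_truncated(phi, mu, sigma))
```

An unwrapped Gaussian would treat these two points as roughly $2\pi$ apart and assign a negligible density, whereas the nearest-image version sees a separation of 0.1.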

B The EM algorithm


This appendix contains two derivations: the modified EM algorithm updates in Eq. (3.2)
and the proof that the modified EM algorithm generically improves the original modified
log likelihood Eq. (2.3) with every iteration. Recall the expected modified complete log
likelihood (mCLL) from Eq. (3.1):

$$\sum_{i=1}^{n}\sum_{j=1}^{k} p_{Ti}^{\alpha}\left(q_{ij}\log\Phi(\rho_i;\mu_j,\Sigma_j) + q_{ij}\log\pi_j\right).$$

Viewing the mCLL as a function of $\mu$, $\Sigma$ and $\pi$ for fixed $\lambda$ and $\rho$, we can maximize. For $\pi$,
we optimize

$$\sum_{i=1}^{n}\sum_{j=1}^{k} p_{Ti}^{\alpha}\left(q_{ij}\log\pi_j\right) + \lambda\left(\sum_{j=1}^{k}\pi_j - 1\right),$$

where the last term is needed so that the optimal $\pi^*$ is a probability. Setting the derivative of this
expression with respect to $\pi_j$ to zero gives

$$\pi_j = -\frac{1}{\lambda}\sum_{i=1}^{n} p_{Ti}^{\alpha} q_{ij},$$

and then summing the equation over $j$ and using $\sum_j q_{ij} = 1$ and the constraint equation
$\sum_j \pi_j = 1$ gives

$$\pi_j^* = \frac{\sum_{i=1}^{n} p_{Ti}^{\alpha} q_{ij}}{\sum_{i=1}^{n} p_{Ti}^{\alpha}}.$$

The updates for $\mu$ and $\Sigma$ follow from the standard derivation (by similarly taking derivatives
of the mCLL with respect to components of these multi-dimensional objects) by noting that
the only difference is that $q_{ij} \mapsto q_{ij}\,p_{Ti}^{\alpha}$ and that, unlike for $\pi^*$, no Lagrange multipliers
are needed.
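
The weighted M step can be sketched in Python as below. This is a minimal illustration under stated assumptions: each jet is taken to be an isotropic two-dimensional Gaussian with a single variance $\sigma_j^2$, the memberships satisfy $\sum_j q_{ij} = 1$, and the $\phi$ wrap-around is ignored. The function name and the toy inputs are hypothetical.

```python
def modified_m_step(points, pts, q, alpha=1.0):
    """Weighted M step with q_ij -> q_ij * pT_i**alpha for isotropic 2D Gaussian jets.

    points: list of (y, phi); pts: list of pT values; q[i][j]: soft memberships.
    Returns (pi, mu, sigma2) as lists over jets. Phi wrap-around is ignored here.
    """
    n, k = len(points), len(q[0])
    w = [pt ** alpha for pt in pts]
    total_w = sum(w)
    pis, mus, sigma2s = [], [], []
    for j in range(k):
        wq = [w[i] * q[i][j] for i in range(n)]  # effective weight of particle i in jet j
        s = sum(wq)
        mu_y = sum(wq[i] * points[i][0] for i in range(n)) / s
        mu_phi = sum(wq[i] * points[i][1] for i in range(n)) / s
        d2 = sum(
            wq[i] * ((points[i][0] - mu_y) ** 2 + (points[i][1] - mu_phi) ** 2)
            for i in range(n)
        )
        pis.append(s / total_w)          # pi_j* from the Lagrange-multiplier argument
        mus.append((mu_y, mu_phi))       # pT-weighted centroid
        sigma2s.append(d2 / (2.0 * s))   # factor 2: two coordinates share one variance
    return pis, mus, sigma2s


# Toy input: two particles assigned entirely to a single jet (k = 1).
print(modified_m_step([(0.0, 0.0), (0.0, 1.0)], [3.0, 1.0], [[1.0], [1.0]]))
```

The jet axis lands at the $p_T$-weighted centroid, so the harder particle dominates both the location and the learned width.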

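Finally, we prove the claim that the modified EM algorithm described in the body of
the text monotonically improves the modified log likelihood in Eq. (2.3). Here $\lambda$ denotes
the latent jet assignment of a particle. First, we note
that we can rewrite the (log) likelihood as

$$\begin{aligned}
p_T^{\alpha}\log p(\rho|\theta) &= p_T^{\alpha}\log\left(\sum_{\lambda} p(\rho,\lambda;\theta)\right) \\
&= p_T^{\alpha}\log\left(\sum_{\lambda} q(\lambda)\,\frac{p(\rho,\lambda;\theta)}{q(\lambda)}\right) \\
&= p_T^{\alpha}\log E_q\left[\frac{p(\rho,\lambda;\theta)}{q(\lambda)}\right] \\
&\ge E_q\left[p_T^{\alpha}\log\frac{p(\rho,\lambda;\theta)}{q(\lambda)}\right] \\
&= \mathcal{L}(q,\theta),
\end{aligned}$$

where the inequality follows from Jensen's inequality. Now, we are ready to
prove the claim that $p_T^{\alpha}\log p(\rho|\theta^{(t)})$ improves monotonically with $t$, the index for the iteration
of the EM algorithm. First, note that

$$\mathcal{L}(q,\theta) = E_q\left[p_T^{\alpha}\log\frac{p(\rho,\lambda;\theta)}{q(\lambda)}\right]
= E_q\left[p_T^{\alpha}\log p(\rho,\lambda;\theta)\right] - E_q\left[p_T^{\alpha}\log q(\lambda)\right],$$

where the first term is the mCLL and the second term has no $\theta$ dependence, so maximizing
$\mathcal{L}(q,\theta)$ over $\theta$ is equivalent to maximizing the mCLL over $\theta$. Therefore, the M step gives
$\mathcal{L}(q^{(t+1)},\theta^{(t)}) \le \mathcal{L}(q^{(t+1)},\theta^{(t+1)})$.
By the inequality above, $\mathcal{L}(q^{(t+1)},\theta^{(t+1)}) \le p_T^{\alpha}\log p(\rho|\theta^{(t+1)})$. The E step can
be recast as choosing

$$q^{(t+1)}(\lambda = j) = q_{ij}(\theta^{(t)}) = E_{Q^{(t)}}[q_{ij}] = p(\lambda|\rho,\theta^{(t)}).$$

This enforces:

$$\begin{aligned}
\mathcal{L}\left(p(\lambda|\rho,\theta^{(t)}),\theta^{(t)}\right)
&= E_{p(\lambda|\rho,\theta^{(t)})}\left[p_T^{\alpha}\log\frac{p(\rho,\lambda;\theta^{(t)})}{p(\lambda|\rho,\theta^{(t)})}\right] \\
&= E_{p(\lambda|\rho,\theta^{(t)})}\left[p_T^{\alpha}\log p(\rho|\theta^{(t)})\right] \\
&= p_T^{\alpha}\log p(\rho|\theta^{(t)}).
\end{aligned}$$

Putting this together with the bounds from the M step, we arrive at the desired result:
$p_T^{\alpha}\log p(\rho|\theta^{(t)}) \le p_T^{\alpha}\log p(\rho|\theta^{(t+1)})$, i.e., every step of the modified EM algorithm improves or
leaves unchanged the original likelihood.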
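
The monotonicity claim can be checked numerically on a toy event. The Python sketch below is an illustration under assumptions, not the implementation used for the studies in this paper: it uses isotropic two-dimensional Gaussian jets, ignores the $\phi$ wrap-around, and draws invented particle positions and momenta from a seeded random generator. All function names are hypothetical.

```python
import math
import random


def gauss2d(x, mu, sigma2):
    """Isotropic 2D Gaussian density with variance sigma2 per coordinate."""
    d2 = (x[0] - mu[0]) ** 2 + (x[1] - mu[1]) ** 2
    return math.exp(-d2 / (2.0 * sigma2)) / (2.0 * math.pi * sigma2)


def log_likelihood(points, w, pis, mus, s2s):
    """Modified log likelihood: sum_i w_i * log sum_j pi_j * Phi(x_i; mu_j, sigma2_j)."""
    return sum(
        wi * math.log(sum(p * gauss2d(x, m, s) for p, m, s in zip(pis, mus, s2s)))
        for x, wi in zip(points, w)
    )


def em_step(points, w, pis, mus, s2s):
    """One E step followed by the weighted M step (q_ij -> q_ij * w_i)."""
    q = []
    for x in points:
        dens = [p * gauss2d(x, m, s) for p, m, s in zip(pis, mus, s2s)]
        total = sum(dens)
        q.append([d / total for d in dens])
    new_pis, new_mus, new_s2s = [], [], []
    for j in range(len(pis)):
        wq = [wi * qi[j] for wi, qi in zip(w, q)]
        s = sum(wq)
        mu = (sum(a * x[0] for a, x in zip(wq, points)) / s,
              sum(a * x[1] for a, x in zip(wq, points)) / s)
        d2 = sum(a * ((x[0] - mu[0]) ** 2 + (x[1] - mu[1]) ** 2) for a, x in zip(wq, points))
        new_pis.append(s / sum(w))
        new_mus.append(mu)
        new_s2s.append(d2 / (2.0 * s))
    return new_pis, new_mus, new_s2s


def run_toy_em(n_iter=25, alpha=1.0, seed=0):
    """Run EM on an invented two-cluster event and record the likelihood per iteration."""
    rng = random.Random(seed)
    centers = [(0.0, 0.0), (2.0, 1.0)]
    points = [(rng.gauss(c[0], 0.3), rng.gauss(c[1], 0.3)) for c in centers for _ in range(30)]
    w = [rng.uniform(1.0, 20.0) ** alpha for _ in points]  # toy pT**alpha weights
    pis, mus, s2s = [0.5, 0.5], [(-0.5, 0.0), (1.5, 0.5)], [1.0, 1.0]
    history = [log_likelihood(points, w, pis, mus, s2s)]
    for _ in range(n_iter):
        pis, mus, s2s = em_step(points, w, pis, mus, s2s)
        history.append(log_likelihood(points, w, pis, mus, s2s))
    return history


history = run_toy_em()
print(all(b >= a - 1e-8 for a, b in zip(history, history[1:])))
```

Since both the E step and the M step are exact in this sketch, the recorded likelihood should be non-decreasing up to floating-point rounding, which is what the final check tests.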

C Controlling Jet Multiplicity with $p_T$


In contrast to most uses of hierarchical-agglomerative clustering algorithms, the number of
fuzzy jets is fixed before clustering begins. Whereas a single traditional jet can reasonably
be considered to correspond to a parton in appropriate cases, mGMM jets should not be,
as several mGMM jets can together express the structure of what would be one or several
jets according to another algorithm. The choice of the number of jets used in mGMM
jet clustering therefore controls the expressive power of the algorithm to look at the event
structure. In practice, choosing too many jets does not greatly affect the value of the leading
learned $\sigma$ variable, because the additional jets learn finer features of the event structure.
On the other hand, choosing too few jets is often problematic, as can be seen in Figure 14:
the fuzzy jets need to grow in order to cover the full energy distribution in the event.
Using anti-$k_t$ jets as seeds for fuzzy jets has the feature that the number of fuzzy jets
changes dynamically with the complexity of the event. The algorithm is not very sensitive
to the exact locations of the anti-$k_t$ jets: studies which randomly perturbed the initial jet
locations inside a disc of radius 1.0 found that $\sigma$ was robust to such fluctuations, even on
an event-by-event basis. However, the $p_T$ threshold for the seed anti-$k_t$ jets can have a
significant impact on the fuzzy jets as this alters the number of seeds. The $p_T$ threshold for
the anti-$k_t$ seeds is typically lower than the $p_T$ threshold one would use to consider anti-$k_t$
jets alone because the fuzzy jets algorithm needs enough seeds to populate the low energy
regions of the detector. One way of mitigating the impact of the $p_T$ cut on the fuzzy jet
clustering is to introduce an event jet, described in Section 5.2.


[Figure 14 panels: Pythia 8, $n_{PU} = 0$ (left); Pythia 8, $n_{PU} = 0$ (right).]
Figure 14. Changing the choice of the $p_T$ cut used to select seeds can make a vast difference in the
values of the constructed variables, like $\sigma$. This event is clustered on the left with a cut of 5 GeV,
resulting in five jets, and on the right with a cut of 50 GeV, resulting in four jets. Fewer degrees of
freedom in the four-jet case mean a much larger learned value for the $\sigma$ variable.

D A Leading Order Description of Fuzzy Jet $\sigma$

We have seen in Sec. 4.4 that the fuzzy jet $\sigma$ is correlated with $\rho = m/p_T$. We can build
some intuition for this relationship by considering a leading order QCD calculation of $\sigma$.
Consider an isolated quark jet with energy $E$ which radiates a gluon with angle $\theta \ll 1$ from
the jet axis and with energy fraction $z \ll 1$. Without loss of generality, suppose the quark
is moving in the $\phi = 0$ direction and the splitting happens in the $\phi = \pi/2$ direction so
that the four-vector of the quark is $E(1-z)(1,0,0,1)$, and the gluon four-vector is
$Ez(1,\theta,0,1)$, to leading order. To this order, the jet mass is simply $m^2 = E^2 z\theta^2$. What
is $\sigma$? Consider $k = 1$ and something like the event jet applied so that we can treat this jet
in isolation from other hadronic activity in the event. Since $k = 1$, the soft memberships
are all one, i.e., $q_{i1} = 1$, and there is only one step of the EM algorithm. The anti-$k_t$ jet
has $(y,\phi)$ coordinates $(0,\theta)$, which could be used for the seed, but since $k = 1$, the seed is
not used. The quark has coordinates $(0,0)$, and the gluon has coordinates $(0,\theta)$. We can
compute the fuzzy jet coordinates in the (single) M step:

$$\mu_y = 0, \tag{D.1}$$

$$\mu_\phi = \frac{0 \times E(1-z) + \theta \times Ez}{E(1-z) + Ez} = z\theta, \tag{D.2}$$

$$\sigma^2 = \frac{(0 - z\theta)^2 \times E(1-z) + (\theta - z\theta)^2 \times Ez}{2\,(E(1-z) + Ez)} \tag{D.3}$$

$$= \frac{1}{2}\,z\theta^2 + O(z^2\theta^2). \tag{D.4}$$

Therefore, to leading order and for $k = 1$, the learned $\sigma$ is, up to a constant factor, the jet mass
in units of the jet energy: $\sigma^2 \simeq m^2/(2E^2)$. For $k = 2$, there
are enough degrees of freedom to resolve the substructure of the hard splitting and so the
relationship between the jet mass and $\sigma$ breaks down.
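
The single M step above is simple enough to evaluate numerically. The Python sketch below computes Eqs. (D.2) and (D.3) as written, including the factor of 2 in the denominator of Eq. (D.3), for an assumed toy splitting; the function name and the values of $z$ and $\theta$ are illustrative choices only.

```python
def single_jet_m_step(z, theta, energy=1.0):
    """Evaluate Eqs. (D.2) and (D.3) for a quark at phi = 0 and a gluon at phi = theta."""
    w_quark, w_gluon = energy * (1.0 - z), energy * z
    total = w_quark + w_gluon
    mu_phi = (0.0 * w_quark + theta * w_gluon) / total
    sigma2 = ((0.0 - mu_phi) ** 2 * w_quark + (theta - mu_phi) ** 2 * w_gluon) / (2.0 * total)
    return mu_phi, sigma2


z, theta = 0.05, 0.2  # assumed toy splitting with z << 1 and theta << 1
mu_phi, sigma2 = single_jet_m_step(z, theta)
print(mu_phi, sigma2, 0.5 * z * (1.0 - z) * theta ** 2)
```

Evaluated exactly, Eq. (D.3) equals $z(1-z)\theta^2/2$, which differs from $z\theta^2/2$ only at order $z^2\theta^2$, so the learned $\sigma^2$ tracks $m^2/E^2$ up to a constant factor for a single soft, collinear emission.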